
Added seed database ingestion helper #2010

Merged
dwnoble merged 3 commits into datacommonsorg:master from dwnoble:seed-db
May 14, 2026

Conversation

@dwnoble dwnoble commented May 14, 2026

This pull request introduces a new seed_database action to the ingestion helper service, which seeds the Spanner database with essential base nodes required by the Data Commons schema. The implementation includes the new action handler, the underlying logic in the Spanner client, and comprehensive tests to ensure correct behavior.

New Feature: Database Seeding

  • Added a seed_database action to the ingestion helper API, which seeds the Spanner database with base nodes such as StatisticalVariable, StatVarGroup, StatVarObservation, Topic, and c/g/Root. This action ensures that the database contains the minimum required schema nodes for Data Commons operations. [1] [2] [3]
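A minimal sketch of the insert-if-missing behavior described above. It assumes a hypothetical in-memory `InMemoryNodeStore` in place of the real Spanner client and a hypothetical `BASE_NODES` map from subject ID to node type. The real method reads from and writes to Spanner, and its columns and type values may differ. Only the four schema types are shown; the root group node from the list above would be one more entry.

```python
from typing import Dict, List, Set

# Hypothetical base-node map: subject ID -> type. Illustrative values only;
# the PR's actual table contents and columns are not shown here.
BASE_NODES: Dict[str, str] = {
    "StatisticalVariable": "Class",
    "StatVarGroup": "Class",
    "StatVarObservation": "Class",
    "Topic": "Class",
    # The root group node from the list above would be one more entry.
}


class InMemoryNodeStore:
    """Hypothetical stand-in for the Spanner node table (not a real API)."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def existing_subjects(self, subjects: List[str]) -> Set[str]:
        """Return the subset of `subjects` that already has a row."""
        return {s for s in subjects if s in self.rows}

    def insert(self, rows: Dict[str, str]) -> None:
        """Insert new rows, rejecting duplicates as a primary-key insert would."""
        for subject in rows:
            if subject in self.rows:
                raise ValueError(f"duplicate key: {subject}")
        self.rows.update(rows)


def seed_database(store: InMemoryNodeStore) -> List[str]:
    """Insert only the base nodes that are missing; return the IDs added."""
    existing = store.existing_subjects(list(BASE_NODES))
    missing = {s: t for s, t in BASE_NODES.items() if s not in existing}
    if missing:
        store.insert(missing)
    return sorted(missing)


store = InMemoryNodeStore()
print(seed_database(store))  # first run inserts all four nodes
print(seed_database(store))  # second run is a no-op: []
```

Reading the existing IDs first and inserting only the difference is what makes the action safe to rerun. An upsert (the Spanner client's `insert_or_update`) would also be idempotent, but it would overwrite any later edits to those rows.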

Testing

  • Implemented unit tests in main_test.py to verify that the seed_database action is handled correctly and that the Spanner client’s seed_database method is called as expected.
  • Added tests in spanner_client_test.py to ensure that the seed_database method inserts base nodes when missing and does not insert duplicates if the nodes already exist.
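A sketch of the `main_test.py`-style check described above, using `unittest.mock.MagicMock` in place of the Spanner client. `handle_action` is a hypothetical dispatcher written for illustration; the real handler in `main.py` and its request format are not shown in this thread.

```python
import unittest
from unittest import mock


def handle_action(action: str, spanner_client) -> dict:
    """Hypothetical dispatcher: routes the seed_database action to the client."""
    if action == "seed_database":
        spanner_client.seed_database()
        return {"status": "ok"}
    raise ValueError(f"Unsupported action: {action}")


class HandleActionTest(unittest.TestCase):
    def test_seed_database_calls_client(self):
        client = mock.MagicMock()
        result = handle_action("seed_database", client)
        # Fails if the client method was skipped or called more than once.
        client.seed_database.assert_called_once_with()
        self.assertEqual(result, {"status": "ok"})

    def test_unknown_action_does_not_seed(self):
        client = mock.MagicMock()
        with self.assertRaises(ValueError):
            handle_action("unknown", client)
        client.seed_database.assert_not_called()
```

Saved as a module, this runs with `python -m unittest <module>`. Mocking the client keeps the handler test independent of a live Spanner instance; the insert-versus-skip behavior is covered separately by the `spanner_client_test.py` tests.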

Documentation

  • Updated the README.md to document the new seed_database action, including its purpose and usage.

@dwnoble dwnoble requested a review from gmechali May 14, 2026 00:26
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new seed_database action to the ingestion helper, which populates the Spanner database with essential base nodes for the Data Commons schema. The implementation includes updates to the README, a new handler in main.py, and the core logic in spanner_client.py, supported by new unit tests. A review comment suggests refactoring the seed_database method to improve maintainability by deriving the subject list from the keys of the candidates dictionary, thereby avoiding redundancy.
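A sketch of the suggested refactor. It assumes the candidates are held in a dict keyed by subject ID and that existence is checked with a parameterized query. The table name `Node`, the column `subject_id`, and the function name `build_existing_nodes_query` are assumptions for illustration, not taken from the PR.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical candidate map keyed by subject ID; values hold row data.
CANDIDATES: Dict[str, Dict[str, Any]] = {
    "StatisticalVariable": {"types": ["Class"]},
    "StatVarGroup": {"types": ["Class"]},
    "StatVarObservation": {"types": ["Class"]},
    "Topic": {"types": ["Class"]},
}


def build_existing_nodes_query(
    candidates: Dict[str, Dict[str, Any]],
) -> Tuple[str, Dict[str, List[str]]]:
    """Build an existence lookup whose subject list comes from the dict keys."""
    # Deriving the list from the keys leaves a single source of truth: a node
    # added to `candidates` is automatically included in the lookup.
    subjects = list(candidates)
    sql = "SELECT subject_id FROM Node WHERE subject_id IN UNNEST(@subjects)"
    return sql, {"subjects": subjects}
```

In GoogleSQL, an array parameter bound to `@subjects` can be matched with `IN UNNEST(...)`. The point of the refactor is only that this array is no longer maintained by hand alongside the dict.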

Comment thread import-automation/workflow/ingestion-helper/spanner_client.py Outdated
@dwnoble dwnoble requested a review from clincoln8 May 14, 2026 00:33
@gmechali gmechali left a comment

thanks Dan, LGTM!

TBH I'm not sure whether we're missing some of the required nodes. I'm unsure whether we need dc/g/Root: very likely, but not certain. We can add it later if confirmed.

Comment thread import-automation/workflow/ingestion-helper/main.py Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@dwnoble dwnoble commented May 14, 2026

Thanks Gabe! That sounds good. We can modify this list as we go.

@dwnoble dwnoble enabled auto-merge (squash) May 14, 2026 18:43
@dwnoble dwnoble merged commit 258e176 into datacommonsorg:master May 14, 2026
9 checks passed